Spatial Econometrics & Outlook

Stefan Jünger & Dennis Abel

2025-04-10

Now

Day       Time         Title
April 09  10:00-11:30  Introduction
April 09  11:30-11:45  Coffee Break
April 09  11:45-13:00  Data Formats
April 09  13:00-14:00  Lunch Break
April 09  14:00-15:30  Mapping I
April 09  15:30-15:45  Coffee Break
April 09  15:45-17:00  Spatial Wrangling
April 10  09:00-10:30  Mapping II
April 10  10:30-10:45  Coffee Break
April 10  10:45-12:00  Applied Spatial Linking
April 10  12:00-13:00  Lunch Break
April 10  13:00-14:30  Spatial Autocorrelation
April 10  14:30-14:45  Coffee Break
April 10  14:45-16:00  Spatial Econometrics & Outlook

What is spatial econometrics?

Econometrics could be reduced to using statistics to model (complex) theories …

  • It is useful for causal inference and causal thinking
  • By default, we think of regression analysis

Therefore, spatial econometrics combines spatial analysis and econometrics.

  • Study of why spatial relationships (i.e., autocorrelation) exist
  • How spatial autocorrelation affects our outcome of interest

What is the data generation process?

Spatial diffusion vs. spatial spillover

In spatial econometrics, we are interested in at least two common mechanisms.

Diffusion

  • \(y_i\) affects \(y_j\) through \(w_{ij}\)
  • \(y_j\) affects \(y_i\) through \(w_{ji}\)
  • that’s a feedback effect
    • endogenous by design!
  • Examples:
    • Pandemic and policy measures to contain the pandemic
    • Diffusion of violence in a war

Spillover

  • \(x_i\) affects \(y_j\) through \(w_{ij}\)
  • \(x_j\) affects \(y_i\) through \(w_{ji}\)
  • Examples:
    • Spillover of economic strength and trade

Let’s have another look at our chessboard

We must think about theories and mechanisms and how they translate into spatial effects and the data generation process.

That said, there are tests to check for the specific data generation process, but they should not be applied naively.
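To make the two mechanisms concrete, here is a minimal base R sketch that simulates diffusion and spillover on a chessboard-like grid. The grid size, the rook-style neighbor definition, and the parameter values (`rho`, `beta`, `theta`) are assumptions chosen for illustration only; they are not taken from the workshop data.

```r
# Minimal sketch with made-up parameters; nothing here is estimated from data
set.seed(42)

# 8 x 8 chessboard; rook neighbors share an edge (Manhattan distance of 1)
n_side <- 8
cells <- expand.grid(row = seq_len(n_side), col = seq_len(n_side))
n <- nrow(cells)
adjacency <- (as.matrix(dist(cells, method = "manhattan")) == 1) * 1

# Row-standardize so that W %*% v is the average of v among neighbors
W <- adjacency / rowSums(adjacency)

x <- rnorm(n)
eps <- rnorm(n)
beta <- 1
rho <- 0.6    # assumed strength of diffusion
theta <- 0.5  # assumed strength of spillover

# Diffusion: y depends on neighboring y, so we solve y = rho * W y + x beta + eps
y_diffusion <- solve(diag(n) - rho * W, beta * x + eps)

# Spillover: y depends on neighboring x only
y_spillover <- beta * x + theta * drop(W %*% x) + eps

# Effect of a unit change in x_j on y_i, for all pairs (i, j)
effects_diffusion <- beta * solve(diag(n) - rho * W)
effects_spillover <- beta * diag(n) + theta * W

# Share of cell pairs with a non-zero effect under each mechanism
mean(effects_diffusion > 1e-12)
mean(effects_spillover > 0)
```

Under diffusion, the feedback loop carries a change in one cell to every other connected cell, whereas under spillover the effect stops at the direct neighbors.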

Is it meaningful or just a nuisance?

Space can be important in our analysis in two ways.

  • It’s meaningful in our theory, and we thus interpret it accordingly after estimation
  • It can distort our empirical estimates, producing bias, inconsistency, and inefficiency

We can address these different perspectives in our analysis with spatial econometric methods.

Formulas… models, models, models

Linear Regression:

\[Y = X\beta + \epsilon\]

Spatial Lag Y / Spatial Autoregressive Model (SAR, Diffusion):

\[Y = \rho WY + X\beta + \epsilon\]

Spatial Lag X Model (SLX, Spillover):

\[Y = X\beta + WX\theta + \epsilon\]

Spatial Error Model (SEM):

\[Y = X\beta + u\] \[u = \lambda Wu + \epsilon\]

Flavors and extensions

Spatial Durbin Model:

\[Y = \rho WY + X\beta + WX\theta + \epsilon \]

Spatial Durbin Error Model:

\[Y = X\beta + WX\theta + u\] \[u = \lambda Wu + \epsilon\]

Combined Spatial Autocorrelation Model:

\[Y = \rho WY + X\beta + u\] \[u = \lambda Wu + \epsilon\]

Manski Model:

\[Y = \rho WY + WX\theta + X\beta + u\] \[u = \lambda Wu + \epsilon\]
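One way to keep the models apart is to read them as restricted versions of the Manski Model. Setting some of its spatial parameters to zero yields the other models listed above:

  • \(\theta = 0\) and \(\lambda = 0\): Spatial Lag Y Model
  • \(\rho = 0\) and \(\lambda = 0\): Spatial Lag X Model
  • \(\rho = 0\) and \(\theta = 0\): Spatial Error Model
  • \(\lambda = 0\): Spatial Durbin Model
  • \(\rho = 0\): Spatial Durbin Error Model
  • \(\theta = 0\): Combined Spatial Autocorrelation Model
  • \(\rho = 0\), \(\theta = 0\), and \(\lambda = 0\): Linear Regression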

Intermediate summary

There are a lot of models you could estimate to explain spatial autocorrelation. And there’s a vast body of literature on the best choice for which application.

We’d explicitly like to recommend the work of Tobias Rüttenauer for us social scientists. Here are some really nice workshop materials.

In this session, we will only estimate the Spatial Lag Y, Spatial Lag X, and Spatial Error Models.

‘Research’ question and data

We will use the same example as in the previous session. But this time, we will test if one of our spatial regression models helps further investigate the data generation process. We may ask:

  1. Do immigrant shares affect AfD voting shares within voting districts?
  2. Do immigrant shares affect AfD voting shares in neighboring districts? (= spillover)
  3. Do AfD voting shares affect AfD voting shares in neighboring districts? (= diffusion)

Controlling for the number of inhabitants within the voting districts might also be a good idea.

Linear regression

linear_regression <-
  lm(afd_share ~ immigrant_share + inhabitants, data = election_results)

summary(linear_regression)

Call:
lm(formula = afd_share ~ immigrant_share + inhabitants, data = election_results)

Residuals:
    Min      1Q  Median      3Q     Max 
-15.010  -3.397  -0.232   2.790  25.032 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)     27.737242   0.579582  47.857  < 2e-16 ***
immigrant_share -0.097675   0.026150  -3.735 0.000207 ***
inhabitants     -0.079595   0.003812 -20.879  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.843 on 540 degrees of freedom
Multiple R-squared:  0.4822,    Adjusted R-squared:  0.4803 
F-statistic: 251.4 on 2 and 540 DF,  p-value: < 2.2e-16

Now we need a spatial weight

Once again, we have to construct spatial weights, as in the analysis of spatial autocorrelation, before we can estimate a spatial regression. In fact, we’ll use the same approach as before.

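# Queen contiguity: districts sharing at least one boundary point are neighbors
queen_neighborhoods <- spdep::poly2nb(election_results, queen = TRUE)

# Row-standardized weights (style = "W"): the weights of each district sum to one
queen_W <- spdep::nb2listw(queen_neighborhoods, style = "W")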
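What row-standardization does to the weights can be illustrated without any spatial package. The following base R sketch builds queen-style neighbors on a hypothetical 3 x 3 chessboard; the grid and the `value` vector are made up for illustration and are not part of the election data.

```r
# 3 x 3 chessboard; queen neighbors share an edge or a corner
cells <- expand.grid(row = 1:3, col = 1:3)
adjacency <- (as.matrix(dist(cells, method = "maximum")) == 1) * 1

# Number of neighbors: 3 for corners, 5 for edge cells, 8 for the center
n_neighbors <- rowSums(adjacency)

# Row-standardization: divide each row by its number of neighbors
W <- adjacency / n_neighbors

# The spatial lag of a variable is then the mean among each cell's neighbors
value <- c(10, 20, 30, 40, 50, 60, 70, 80, 90)
spatial_lag <- drop(W %*% value)
spatial_lag
```

With these weights, every district's spatial lag is an average over its neighbors, which keeps the spatial parameters comparable across districts with few and many neighbors.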

Spatial Error Model: If we want to control for nuisance

spatial_error_model <-
  spatialreg::errorsarlm(
    afd_share ~ immigrant_share + inhabitants,
    data = election_results,
    listw = queen_W
    )

summary(spatial_error_model)

Call:spatialreg::errorsarlm(formula = afd_share ~ immigrant_share + 
    inhabitants, data = election_results, listw = queen_W)

Residuals:
     Min       1Q   Median       3Q      Max 
-9.45189 -2.38063 -0.41255  1.94994 25.74532 

Type: error 
Coefficients: (asymptotic standard errors) 
                  Estimate Std. Error z value  Pr(>|z|)
(Intercept)     22.9287086  0.9506456 24.1191 < 2.2e-16
immigrant_share -0.0900839  0.0281413 -3.2011  0.001369
inhabitants     -0.0333569  0.0046216 -7.2175 5.294e-13

Lambda: 0.764, LR test value: 217.73, p-value: < 2.22e-16
Asymptotic standard error: 0.03331
    z-value: 22.936, p-value: < 2.22e-16
Wald statistic: 526.05, p-value: < 2.22e-16

Log likelihood: -1516.68 for error model
ML residual variance (sigma squared): 13.541, (sigma: 3.6798)
Number of observations: 543 
Number of parameters estimated: 5 
AIC: 3043.4, (AIC for lm: 3259.1)

Spatial Lag X Model: estimating spillovers

spatial_lag_x_model <-
  spatialreg::lmSLX(
    afd_share ~ immigrant_share + inhabitants,
    data = election_results,
    listw = queen_W
  )

summary(spatial_lag_x_model)

Call:
lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1], collapse = "+"))), 
    data = as.data.frame(x), weights = weights)

Coefficients:
                     Estimate     Std. Error   t value      Pr(>|t|)   
(Intercept)            3.062e+01    6.777e-01    4.517e+01   3.153e-185
immigrant_share       -7.937e-02    3.450e-02   -2.301e+00    2.178e-02
inhabitants           -2.496e-02    5.910e-03   -4.223e+00    2.827e-05
lag.immigrant_share   -1.358e-02    4.769e-02   -2.848e-01    7.759e-01
lag.inhabitants       -8.721e-02    7.721e-03   -1.130e+01    1.073e-26
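As the `lm()` call in the output suggests, a Spatial Lag X Model is an ordinary linear regression with spatially lagged predictors added. The following base R sketch reproduces that logic by hand on simulated data; the grid, the variable names (`x`, `lag_x`), and the true coefficients are assumptions for illustration only.

```r
# Minimal sketch on simulated data; all numbers are made up
set.seed(1)

# 20 x 20 grid with queen neighbors and row-standardized weights
n_side <- 20
cells <- expand.grid(row = seq_len(n_side), col = seq_len(n_side))
adjacency <- (as.matrix(dist(cells, method = "maximum")) == 1) * 1
W <- adjacency / rowSums(adjacency)

# Predictor and its spatial lag (the neighbors' average)
x <- rnorm(nrow(cells))
lag_x <- drop(W %*% x)

# Outcome with a direct effect of -0.5 and a spillover effect of 0.8
y <- 2 - 0.5 * x + 0.8 * lag_x + rnorm(nrow(cells))

# SLX by hand: plain OLS with the lagged predictor as an extra regressor
slx_by_hand <- lm(y ~ x + lag_x)
coef(slx_by_hand)
```

Because no lag of the outcome is involved, there is no feedback loop, and the coefficients keep their usual regression interpretation.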

Spatial Lag Y Model: estimating diffusion

spatial_lag_y_model <-
  spatialreg::lagsarlm(
    afd_share ~ immigrant_share + inhabitants,
    data = election_results,
    listw = queen_W
  )

summary(spatial_lag_y_model)

Call:spatialreg::lagsarlm(formula = afd_share ~ immigrant_share + 
    inhabitants, data = election_results, listw = queen_W)

Residuals:
     Min       1Q   Median       3Q      Max 
-10.1439  -2.2643  -0.2712   1.9608  24.2918 

Type: lag 
Coefficients: (asymptotic standard errors) 
                  Estimate Std. Error z value Pr(>|z|)
(Intercept)     10.1883400  0.9871671 10.3208  < 2e-16
immigrant_share -0.0549475  0.0197113 -2.7876  0.00531
inhabitants     -0.0329164  0.0034473 -9.5484  < 2e-16

Rho: 0.66913, LR test value: 261.29, p-value: < 2.22e-16
Asymptotic standard error: 0.035258
    z-value: 18.978, p-value: < 2.22e-16
Wald statistic: 360.16, p-value: < 2.22e-16

Log likelihood: -1494.896 for lag model
ML residual variance (sigma squared): 13.024, (sigma: 3.6089)
Number of observations: 543 
Number of parameters estimated: 5 
AIC: 2999.8, (AIC for lm: 3259.1)
LM test for residual autocorrelation
test value: 20.507, p-value: 5.9418e-06

Comparison: What’s ‘better’?

AIC(spatial_error_model, spatial_lag_x_model, spatial_lag_y_model)
                    df      AIC
spatial_error_model  5 3043.360
spatial_lag_x_model  6 3146.285
spatial_lag_y_model  5 2999.792

spdep::lm.LMtests(linear_regression, queen_W, test = c("LMerr", "LMlag"))

    Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
    dependence

data:  
model: lm(formula = afd_share ~ immigrant_share + inhabitants, data =
election_results)
test weights: listw

RSerr = 206.05, df = 1, p-value < 2.2e-16


    Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
    dependence

data:  
model: lm(formula = afd_share ~ immigrant_share + inhabitants, data =
election_results)
test weights: listw

RSlag = 308.15, df = 1, p-value < 2.2e-16
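The AIC values in the comparison can be reproduced from the log-likelihoods and parameter counts printed in the model summaries. A minimal base R sketch, with the numbers copied by hand from the Spatial Error and Spatial Lag Y outputs:

```r
# Log-likelihoods and parameter counts copied from the model summaries
log_lik <- c(spatial_error_model = -1516.68, spatial_lag_y_model = -1494.896)
n_parameters <- c(spatial_error_model = 5, spatial_lag_y_model = 5)

# AIC = 2 * k - 2 * log-likelihood; lower values indicate a better trade-off
aic_by_hand <- 2 * n_parameters - 2 * log_lik
aic_by_hand
```

Since both models estimate the same number of parameters, the AIC difference here comes entirely from the difference in log-likelihood.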

Let’s stick to our theory, shall we?

Of higher importance: interpretation

Unfortunately, in a Spatial Lag Y Model, the coefficients cannot be read as marginal effects, and the spatial parameter \(\rho\) mainly tells us whether spatial dependence is (statistically) significant.

  • Remember: these models are endogenous by design
    • We have effects of \(y_j\) on \(y_i\) and vice versa
    • What a mess

Luckily, there’s a method to decompose the spatial effects into direct, indirect, and total effects: estimating impacts
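A short derivation shows where these impacts come from. The symbols \(I\) (identity matrix), \(n\) (number of observations), \(\iota\) (a vector of ones), and \(S_k\) are notation introduced here for illustration. Solving the Spatial Lag Y Model for \(Y\) gives the reduced form:

\[Y = (I - \rho W)^{-1}(X\beta + \epsilon)\]

The effect of a change in the \(k\)-th predictor is therefore a full \(n \times n\) matrix rather than a single coefficient:

\[S_k = (I - \rho W)^{-1}\beta_k\]

The usual summary measures average over this matrix:

\[\text{Direct}_k = \frac{1}{n}\operatorname{tr}(S_k)\] \[\text{Total}_k = \frac{1}{n}\iota^\top S_k \iota\] \[\text{Indirect}_k = \text{Total}_k - \text{Direct}_k\]

With row-standardized weights, where every row of \(W\) sums to one, the total impact simplifies to:

\[\text{Total}_k = \frac{\beta_k}{1 - \rho}\]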

Impact estimation in R

This time, let’s start with the Spatial Lag Y Model:

spatialreg::impacts(spatial_lag_y_model, listw = queen_W)
Impact measures (lag, exact):
                     Direct    Indirect       Total
immigrant_share -0.06185718 -0.10421054 -0.16606773
inhabitants     -0.03705569 -0.06242758 -0.09948327

Compare it to the ‘simple’ regression output:

coef(spatial_lag_y_model)
            rho     (Intercept) immigrant_share     inhabitants 
     0.66912619     10.18833999     -0.05494746     -0.03291641 
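With row-standardized weights, the total impact in a Spatial Lag Y Model equals the coefficient divided by one minus \(\rho\). The following base R sketch checks this against the impact table, using the coefficients printed above; it assumes that every district has at least one neighbor.

```r
# Coefficients copied from coef(spatial_lag_y_model)
rho <- 0.66912619
beta <- c(immigrant_share = -0.05494746, inhabitants = -0.03291641)

# Total impact = beta / (1 - rho) when each row of weights sums to one
total_impact <- beta / (1 - rho)
total_impact
```

The results agree with the Total column of the impact table, and they are roughly three times as large as the raw coefficients because of the feedback between neighbors.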

Spatial Lag X impacts

spatialreg::impacts(spatial_lag_x_model, listw = queen_W)
Impact measures (SlX, glht):
                     Direct    Indirect       Total
immigrant_share -0.07937153 -0.01358185 -0.09295338
inhabitants     -0.02495873 -0.08720666 -0.11216539

Compare it to the ‘simple’ regression output:

coef(spatial_lag_x_model)
        (Intercept)     immigrant_share         inhabitants lag.immigrant_share 
        30.61564892         -0.07937153         -0.02495873         -0.01358185 
    lag.inhabitants 
        -0.08720666 
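In the Spatial Lag X Model, the impacts map directly onto the coefficients: the direct impact is the coefficient of the predictor, the indirect impact is the coefficient of its spatial lag, and the total impact is their sum. A minimal base R check with the numbers printed above:

```r
# Coefficients copied from coef(spatial_lag_x_model)
beta <- c(immigrant_share = -0.07937153, inhabitants = -0.02495873)
theta <- c(immigrant_share = -0.01358185, inhabitants = -0.08720666)

# Direct = own coefficient, indirect = lag coefficient, total = sum of both
slx_impacts <- cbind(Direct = beta, Indirect = theta, Total = beta + theta)
slx_impacts
```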

If you need p-values and stuff

spatialreg::impacts(spatial_lag_y_model, listw = queen_W, R = 500) |> 
  summary(zstats = TRUE, short = TRUE)
Impact measures (lag, exact):
                     Direct    Indirect       Total
immigrant_share -0.06185718 -0.10421054 -0.16606773
inhabitants     -0.03705569 -0.06242758 -0.09948327
========================================================
Simulation results ( variance matrix):
========================================================
Simulated standard errors
                     Direct    Indirect       Total
immigrant_share 0.021080283 0.037089940 0.056999055
inhabitants     0.003488308 0.008371244 0.009850455

Simulated z-values:
                    Direct  Indirect      Total
immigrant_share  -2.908022 -2.830335  -2.917222
inhabitants     -10.629987 -7.599456 -10.222631

Simulated p-values:
                Direct     Indirect   Total     
immigrant_share 0.0036372  0.0046499  0.0035316 
inhabitants     < 2.22e-16 2.9754e-14 < 2.22e-16
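The simulated p-values are consistent with two-sided tests against a standard normal distribution. Assuming that this is how they are derived, they can be recovered from the simulated z-values in base R:

```r
# Simulated z-values for immigrant_share, copied from the output above
z_values <- c(Direct = -2.908022, Indirect = -2.830335, Total = -2.917222)

# Two-sided p-value under a standard normal distribution
p_values <- 2 * pnorm(-abs(z_values))
p_values
```

Note that the z-values are based on the simulated draws, so they differ slightly from dividing the point estimates by the simulated standard errors, and they change a little between runs unless a seed is set.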

Exercise 2_3_2: Spatial Regression

Outlook

What’s left

Other map types, such as

  • Cartograms
  • Hexagon maps
  • (more) animated maps
  • Network graphs

GIS techniques, such as

  • Geocoding
  • Routing
  • Cluster analysis

More Advanced Spatial(-temporal) Modeling

More data sources…

Data Sources

Some more information:

  • Geospatial data are interdisciplinary
  • Amount of data feels unlimited
  • Data providers and data portals are often specific in the area and/or the information they cover

Some random examples:

The End